Courses@HKUST
- EESM5060 Embedded Systems
- ELEC6910A First Principles of CV
Books
- Digital Integrated Circuits, A Design Perspective. Second Edition
- CMOS VLSI Design, A Circuits and Systems Perspective. Fourth Edition
- Verilog Digital System Design. Second Edition
- Computer Architecture, A Quantitative Approach. Sixth Edition
- Dive into Deep Learning (动手学深度学习), Release 2.0.0-beta1
- Research on Computing Architecture and Memory Optimization Techniques for Neural Network Accelerators (神经网络加速器的计算架构及存储优化技术研究). (Thanks to Prof. Fengbin Tu, the author, for gifting me this book.)
Papers
- AutoDCIM: An Automated Digital CIM Compiler
- Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks
- DaDianNao: A Machine-Learning Supercomputer
- Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
- Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
- PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory
- Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips
- Stripes: Bit-Serial Deep Neural Network Computing
- A 4nm 6163-TOPS/W/b 4790-TOPS/mm2/b SRAM Based Digital-Computing-in-Memory Macro Supporting Bit-Width Flexibility and Simultaneous MAC and Weight Update
- An 89TOPS/W and 16.3TOPS/mm2 All-Digital SRAM-Based Full-Precision Compute-In-Memory Macro in 22nm for Machine-Learning Edge Applications
- A 5-nm 254-TOPS/W 221-TOPS/mm2 Fully-Digital Computing-in-Memory Macro Supporting Wide-Range Dynamic-Voltage-Frequency Scaling and Simultaneous MAC and Write Operations
- A 12nm 121-TOPS/W 41.6-TOPS/mm2 All Digital Full Precision SRAM-based Compute-in-Memory with Configurable Bit-width For AI Edge Applications
- An Ultra-Low-Voltage Bit-Interleaved Synthesizable 13T SRAM Circuit
- All-Digital Time-Domain Compute-in-Memory Engine for Binary Neural Networks With 1.05 POPS/W Energy Efficiency
- Multi-Function CIM Array for Genome Alignment Applications built with Fully Digital Flow
- Compiling All-Digital-Embedded Content Addressable Memories on Chip for Edge Application
- AI SoC Design in the Foundation Model Era
- Benchmark and Modelling for SRAM-based CIM
- A Survey of Accelerator Architecture for DNNs
- DIMC: 2219TOPS/W 2569F2/b Digital In-Memory Computing Macro in 28nm Based on Approximate Arithmetic Hardware
- A 28nm 38-to-102-TOPS/W 8b Multiply-Less Approximate Digital SRAM Compute-In-Memory Macro for Neural-Network Inference
- Approximate De-randomizer for Stochastic Circuits
- Algorithm-Software-Hardware Co-Design for Deep Learning Acceleration
- Timeloop: A Systematic Approach to DNN Accelerator Evaluation
- DynaPlasia: An eDRAM In-Memory-Computing-Based Reconfigurable Spatial Accelerator with Triple-Mode Cell for Dynamic Resource Switching
- A 28nm 11.2TOPS/W Hardware-Utilization-Aware Neural-Network Accelerator with Dynamic Dataflow
- Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach (MAESTRO)
- MNSIM 2.0: A Behavior-Level Modeling Tool for Processing-In-Memory Architectures
- Towards Heterogeneous Multi-core Accelerators Exploiting Fine-grained Scheduling of Layer-Fused Deep Neural Networks
- DIANA: An End-to-End Energy-Efficient DIgital and ANAlog Hybrid Neural Network SoC
- MARS: Multimacro Architecture SRAM CIM-Based Accelerator With Co-Designed Compressed Neural Networks
- Scalable and Programmable Neural Network Inference Accelerator Based on In-Memory Computing
- Fused-Layer CNN Accelerators
- Automatic Generation of Structured Macros Using Standard Cells ‒ Application to CIM